Details of the input data

First group of samples (to be referred to as control in the rest of the report)

Sample Names:
control1
control2
control3
control4
control5
control6
control7
control8
control9

Second group of samples (to be referred to as treatment in the rest of the report)

Sample Names:
treat1
treat2
treat3
treat4
treat5
treat6
treat7
treat8
treat9

Note: A positive log fold change shows higher expression in the treatment group; a negative log fold change represents higher expression in the control group.
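As a minimal illustration of this sign convention, the sketch below computes a log2 fold change from per-group mean expression. The function name, the pseudocount and the example values are assumptions made for illustration only; the DE packages used in this report estimate fold changes with their own statistical models.

```python
import math

def log2_fold_change(control_values, treatment_values, pseudocount=1.0):
    """Return log2(mean treatment / mean control).

    The pseudocount (an assumption of this sketch) avoids taking log(0).
    """
    mean_control = sum(control_values) / len(control_values)
    mean_treatment = sum(treatment_values) / len(treatment_values)
    return math.log2((mean_treatment + pseudocount) / (mean_control + pseudocount))

# Hypothetical normalized counts for a single gene
up_in_treatment = log2_fold_change([10, 12, 11], [40, 44, 48])  # positive value
up_in_control = log2_fold_change([40, 44, 48], [10, 12, 11])    # negative value
```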

Data quality control (QC)

Correlation between samples:

Here we show scatterplots comparing expression levels for all genes between the different samples, for i) all controls, ii) all treatment samples and iii) all samples together.

These plots will only be produced when the total number of samples to compare within a group is less than or equal to 10.

Correlation between control samples:

Replicates within the same group tend to have Pearson correlation coefficients >= 0.96. Lower values may indicate problems with the samples.

Correlation between treatment samples:

Replicates within the same group tend to have Pearson correlation coefficients >= 0.96. Lower values may indicate problems with the samples.
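The replicate check described above can be sketched as follows: compute the Pearson correlation for every pair of samples in a group and flag pairs below 0.96. The function names and the expression values are hypothetical and are not taken from this run.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def low_correlation_pairs(samples, threshold=0.96):
    """Return (sample_a, sample_b, r) for every pair with r below the threshold."""
    flagged = []
    for (name_a, expr_a), (name_b, expr_b) in combinations(samples.items(), 2):
        r = pearson(expr_a, expr_b)
        if r < threshold:
            flagged.append((name_a, name_b, r))
    return flagged

# Hypothetical expression values for five genes in three replicates
samples = {
    "control1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "control2": [1.1, 2.1, 2.9, 4.2, 5.0],
    "control3": [5.0, 1.0, 4.0, 2.0, 3.0],  # deliberately discordant replicate
}
flagged = low_correlation_pairs(samples)
```

In this toy input only the pairs involving control3 fall below the threshold, which is the pattern expected from a problematic sample.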

Heatmap and clustering showing correlation between replicates

BROWN: higher correlation; YELLOW: lower

Principal Component Analysis

This is a PCA plot of the count values following rlog normalization from the DESeq2 package:

The samples are shown in the 2D plane and distributed by their first two principal components. This type of plot is useful for visualizing the overall effect of experimental covariates and batch effects. It is also useful for identifying outlier samples. Samples from the same group (control or treatment) may be expected to cluster together.

Visualizing normalization results

These boxplots show the distributions of count data before and after normalization (shown for normalization method DESeq2):

Representation of cpm unfiltered data:

Before normalization:

After normalization:
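For reference, the counts-per-million (cpm) representation mentioned above scales each raw count by the total number of counts in its sample. A minimal sketch, with a hypothetical function name and input layout:

```python
def counts_per_million(counts_by_sample):
    """Convert raw counts to cpm.

    counts_by_sample maps sample name -> {gene: raw count}.
    """
    cpm = {}
    for sample, gene_counts in counts_by_sample.items():
        library_size = sum(gene_counts.values())
        cpm[sample] = {
            gene: count / library_size * 1_000_000
            for gene, count in gene_counts.items()
        }
    return cpm

# Hypothetical two-gene library
example_cpm = counts_per_million({"sample_a": {"gene1": 250, "gene2": 750}})
```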

Gene counts variance distribution

The variance of gene counts across samples is shown. Genes with a variance below the selected threshold (dashed grey line) were filtered out.
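A variance filter of this kind could be sketched as below. It assumes the threshold is taken as a quantile of the per-gene variances (suggested by the count_var_quantile option, but not confirmed by this report); the nearest-rank quantile, the use of population variance and all names are assumptions of the sketch.

```python
import statistics

def filter_by_variance(counts_by_gene, quantile=0.0):
    """Keep genes whose variance across samples reaches the quantile threshold.

    counts_by_gene maps gene -> list of counts (one per sample).
    Returns (set of kept genes, threshold used).
    """
    variances = {
        gene: statistics.pvariance(values)
        for gene, values in counts_by_gene.items()
    }
    ordered = sorted(variances.values())
    # Nearest-rank style quantile: an assumption, not the tool's documented rule
    index = min(int(quantile * len(ordered)), len(ordered) - 1)
    threshold = ordered[index]
    kept = {gene for gene, var in variances.items() if var >= threshold}
    return kept, threshold

# Hypothetical counts for four genes in four samples
counts = {
    "a": [1, 1, 1, 1],
    "b": [1, 2, 3, 4],
    "c": [0, 10, 0, 10],
    "d": [5, 6, 5, 6],
}
kept_half, threshold_half = filter_by_variance(counts, quantile=0.5)
kept_all, _ = filter_by_variance(counts, quantile=0.0)
```

Under these assumptions a quantile of 0 keeps every gene, so no gene is removed by this step.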

Samples differences by all counts normalized:

All counts were normalized with the DESeq2 algorithm. These counts were then log10-scaled and plotted in a heatmap.

Percentages of reads per sample mapping to the most highly expressed genes

Gene control1 control2 control3 control4 control5 control6 control7 control8 control9 treat1 treat2 treat3 treat4 treat5 treat6 treat7 treat8 treat9
ENSG00000115414 4.479 6.406 1.618 7.148 6.641 2.878 3.386 4.822 1.410 4.599 3.736 2.027 6.358 4.221 2.802 4.552 4.778 3.282
ENSG00000108821 2.857 3.723 2.892 3.823 4.358 3.803 2.346 3.076 2.493 2.725 2.803 3.254 2.622 2.876 4.014 1.899 2.871 4.262
ENSG00000276168 1.737 1.380 3.442 2.854 1.933 1.763 1.882 1.607 3.066 1.930 1.487 2.682 1.637 1.997 1.849 2.059 1.842 1.708
ENSG00000164692 0.992 1.490 1.302 1.397 1.729 1.663 0.770 1.241 1.112 1.019 1.125 1.495 1.018 1.195 1.737 0.965 1.658 2.127
ENSG00000168542 0.410 0.686 0.947 0.667 0.648 1.006 0.343 0.601 0.758 0.305 0.588 1.045 0.264 0.538 1.086 0.188 0.537 1.163

DEgenes Hunter results

Gene classification by DEgenes Hunter

DEgenes Hunter uses multiple DE detection packages to analyse all genes in the input count table and labels them accordingly:

  • Filtered out: Genes discarded during the filtering process as showing no or very low expression.
  • Prevalent DEG: Genes considered as differentially expressed (DE) by at least 4 packages, as specified by the minpack_common argument.
  • Possible DEG: Genes considered DE by at least one of the DE detection packages.
  • Not DEG: Genes not considered DE in any package.

This barplot shows the total number of genes passing each stage of analysis - from the total number of genes in the input table of counts, to the genes surviving the expression filter, to the genes detected as DE by one package, to the genes detected by at least 4 packages.
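The labelling rules listed above can be sketched as a small function. The function name and the example genes are hypothetical; because a gene detected by 4 packages is also detected by at least one, the sketch checks the prevalent condition first, which is an assumption about how the overlap is resolved.

```python
from collections import Counter

def classify_gene(passed_filter, n_packages_de, minpack_common=4):
    """Assign a label from the number of packages that call the gene DE."""
    if not passed_filter:
        return "Filtered out"
    if n_packages_de >= minpack_common:
        return "Prevalent DEG"
    if n_packages_de >= 1:
        return "Possible DEG"
    return "Not DEG"

# Hypothetical genes: (passed expression filter, number of packages calling it DE)
genes = {
    "geneA": (False, 0),
    "geneB": (True, 4),
    "geneC": (True, 2),
    "geneD": (True, 0),
}
labels = {gene: classify_gene(*info) for gene, info in genes.items()}
label_counts = Counter(labels.values())  # totals per label, as summarized in the barplot
```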

Package DEG detection stats

This is the Venn Diagram of all possible DE genes (DEGs) according to at least one of the DE detection packages employed:

Plot showing variability between different DEG detection methods in terms of logFC calculation

This graph shows the logFC calculated (y-axis) by each package (points) for each gene (x-axis). Only genes with variability over 0.01 will be plotted. This representation allows the user to observe the behaviour of each DE package and see whether any of them gives atypical results.

If there are no genes showing sufficient variance in estimated logFC across methods, no plot will be produced and a warning message will be given.
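The selection of genes for this plot can be sketched as follows. The report does not state how variability is measured, so the use of the sample variance of logFC across packages is an assumption, as are the function name and the example values.

```python
import statistics

def variable_logfc_genes(logfc_by_gene, min_variability=0.01):
    """Return {gene: variance} for genes whose logFC varies across packages.

    logfc_by_gene maps gene -> {package: logFC}.
    """
    selected = {}
    for gene, per_package in logfc_by_gene.items():
        values = list(per_package.values())
        if len(values) < 2:
            continue  # variance is undefined with a single estimate
        variability = statistics.variance(values)
        if variability > min_variability:
            selected[gene] = variability
    return selected

# Hypothetical logFC estimates
logfc = {
    "geneA": {"DESeq2": 1.00, "edgeR": 1.01, "limma": 1.02},  # packages agree
    "geneB": {"DESeq2": 1.00, "edgeR": 1.50, "limma": 2.00},  # packages disagree
}
to_plot = variable_logfc_genes(logfc)
if not to_plot:
    print("Warning: no genes with sufficient logFC variance across methods")
```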

FDR gene-wise benchmarking

Benchmark of false positive calling:

Boxplot of FDR values among all genes with an FDR <= 0.05 in at least one DE detection package

FDR Volcano Plot showing log2 fold change vs. FDR. The red horizontal line represents the chosen FDR threshold of 0.05 (the p_val_cutoff option). The black lines represent other values.

The complete results of the DEgenes Hunter differential expression analysis can be found in the “hunter_results_table.txt” file in the Common_results folder.

DE detection package specific results

Various plots specific to each package are shown below:

DESeq2 normalization effects:

This plot compares the effective library size with the raw library size.

The effective library size is the factor used by the DESeq2 normalization algorithm for each sample. The effective library size is expected to depend on the raw library size.
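To show why this factor tracks the raw library size, here is a simplified sketch of the median-of-ratios estimation that DESeq2 uses by default for its size factors. It is not the package's implementation: the function name, the input layout and the example counts are assumptions.

```python
import math
import statistics

def size_factors(counts):
    """Median-of-ratios size factors.

    counts maps sample -> list of raw counts, with the same gene order
    in every sample.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Only genes with a non-zero count in every sample have a usable geometric mean
    usable = [g for g in range(n_genes) if all(counts[s][g] > 0 for s in samples)]
    # Log of the geometric mean of each usable gene across samples
    log_geo_means = {
        g: sum(math.log(counts[s][g]) for s in samples) / len(samples)
        for g in usable
    }
    factors = {}
    for s in samples:
        log_ratios = [math.log(counts[s][g]) - log_geo_means[g] for g in usable]
        factors[s] = math.exp(statistics.median(log_ratios))
    return factors

# Hypothetical counts: sample_b was sequenced twice as deeply as sample_a
factors = size_factors({
    "sample_a": [10, 20, 30, 0],
    "sample_b": [20, 40, 60, 5],
})
```

In this toy input the factor for sample_b is twice that of sample_a, mirroring the two-fold difference in sequencing depth.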

DESeq2 MA plot:

This is the MA plot from the DESeq2 package:

In DESeq2, the MA-plot (log ratio versus abundance) shows the log2 fold changes attributable to a given variable over the mean of normalized counts. Points are colored red if the adjusted p-value is less than 0.1. Points which fall outside the window are plotted as open triangles pointing either up or down.

A table containing the DESeq2 DEGs is provided in Results_DESeq2/DEgenes_DESEq2.txt

A table containing the DESeq2 normalized counts is provided in Results_DESeq2/Normalized_counts_DESEq2.txt

Differences between samples by PREVALENT DEGs normalized counts:

Counts of prevalent DEGs were normalized with the DESeq2 algorithm. These counts were then log10-scaled and plotted in a heatmap.

edgeR MA plot

This is the MA plot from the edgeR package:

Differential gene expression data can be visualized as MA-plots (log ratio versus abundance) where each dot represents a gene. The differentially expressed genes are colored red and the non-differentially expressed ones are colored black.

A table containing the edgeR DEGs is provided in Results_edgeR/DEgenes_edgeR.txt

A table containing the edgeR normalized counts is provided in Results_edgeR/Normalized_counts_edgeR.txt

limma Volcano plot

Volcano plot of log2-fold change versus -log10 of adjusted p-values for all genes according to the analysis with limma:

A table containing the limma DEGs is provided in Results_limma/DEgenes_limma.txt

A table containing the limma normalized counts is provided in Results_limma/Normalized_counts_limma.txt
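The coordinates of a volcano plot such as the limma one above can be sketched as follows. The function name is hypothetical, and flagging genes with both the p_val_cutoff (0.05) and lfc (1) options is an assumption about how the thresholds are combined.

```python
import math
import sys

def volcano_point(log2_fc, adjusted_p, p_cutoff=0.05, lfc_cutoff=1.0):
    """Return (x, y, significant) for one gene of a volcano plot."""
    # Guard against log10(0) for p-values reported as exactly zero
    safe_p = max(adjusted_p, sys.float_info.min)
    y = -math.log10(safe_p)
    significant = adjusted_p <= p_cutoff and abs(log2_fc) >= lfc_cutoff
    return log2_fc, y, significant

# Hypothetical genes
strong_hit = volcano_point(2.0, 0.001)    # large effect, low adjusted p-value
small_effect = volcano_point(0.5, 0.001)  # significant but below the logFC cutoff
not_significant = volcano_point(-1.5, 0.2)
```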

NOISeq Expressionplot

This is the summary plot for (M,D) values (black) and the differentially expressed genes (red) from the NOISeq package (image extracted from the “ExpressionPlot.pdf” file):

A table containing the NOISeq DEGs is provided in Results_NOISeq/DEgenes_NOISeq.txt.

A table containing the NOISeq normalized counts is provided in Results_NOISeq/Normalized_counts_NOISeq.txt

WGCNA Results

WGCNA was run to look for modules (clusters) of coexpressed genes. These modules were then compared with the sample factors to look for correlation. If no sample factors were specified, this comparison was performed with treatment/control labels.

The following graphic shows the power value chosen for building clusters. The power is chosen by looking at the characteristics of the network produced.
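As background for the power value, here is a minimal sketch of how a soft-threshold power acts on correlations. It assumes the standard signed-network adjacency ((1 + r) / 2) ^ power, in line with the signed network type listed in the options; the function name and example values are hypothetical, and the selection of the power itself is not reproduced here.

```python
def signed_adjacency(correlation, power):
    """Signed-network adjacency: maps a correlation in [-1, 1] to [0, 1]."""
    return ((1.0 + correlation) / 2.0) ** power

# Raising the power suppresses weak correlations much faster than strong ones
for power in (1, 6, 12):
    weak = signed_adjacency(0.2, power)
    strong = signed_adjacency(0.9, power)
    print(f"power={power}: weak={weak:.4f} strong={strong:.4f}")
```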

In total there were 36 clusters. The following plot shows the number of genes per cluster:

Module Membership distribution

Cluster assignment vs lower module membership (MM)

This plot shows, for each gene, the cluster ID assigned by WGCNA versus the cluster whose eigengene has the highest correlation with that gene (module membership, MM).

Cluster vs. factors correlation

The following plots show the correlation between the different modules and specified factors. This is done using eigengenes, which can be broadly thought of as the average expression pattern for the genes in a given cluster. MEn refers to the eigengene for cluster n.
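Following the "average expression pattern" simplification above, the sketch below averages the standardized expression of the genes in a cluster and correlates that profile with a factor. WGCNA itself derives the eigengene as the first principal component of the module, so this is an approximation; all names and values are hypothetical.

```python
import math
import statistics

def standardize(values):
    """Scale one gene to mean 0 and standard deviation 1 (fails for constant genes)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def average_profile(module_expression):
    """Average standardized expression per sample over all genes in a module."""
    scaled = [standardize(gene) for gene in module_expression]
    n_samples = len(scaled[0])
    return [statistics.fmean(gene[i] for gene in scaled) for i in range(n_samples)]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical module of three genes measured in two control and two treatment samples
module = [
    [1, 2, 8, 9],
    [2, 2, 6, 7],
    [0, 1, 5, 5],
]
treatment_factor = [0, 0, 1, 1]  # 0 = control, 1 = treatment
module_profile = average_profile(module)
module_factor_correlation = pearson(module_profile, treatment_factor)
```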

This plot shows the correlation between clusters (eigengenes) and factors directly.

WGCNA Eigen values clustering

WGCNA dendrogram showing distances between these eigengenes along with the factors. Distances were calculated using signed correlation, so the closer two elements are, the more positively correlated they are.

Eigen values clustering (Absolute correlation)

WGCNA-like dendrogram showing distances between these eigengenes along with the factors. Distances were calculated using absolute correlation, so the closer two elements are, the higher their absolute correlation.

Correlation network between modules and factors

This plot shows modules (black) and factors (green) as nodes. Correlation coefficients over 0.8 (red) and under -0.8 (blue) are represented as edges.
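The edge selection described above can be sketched as a simple threshold on the module-factor correlations. The function name, the input layout and the correlation values are hypothetical.

```python
def correlation_edges(correlations, threshold=0.8):
    """Return (module, factor, color) edges for strong correlations.

    correlations maps (module, factor) -> correlation coefficient.
    """
    edges = []
    for (module, factor), r in correlations.items():
        if r > threshold:
            edges.append((module, factor, "red"))   # strong positive correlation
        elif r < -threshold:
            edges.append((module, factor, "blue"))  # strong negative correlation
    return edges

# Hypothetical module-factor correlations
edges = correlation_edges({
    ("ME1", "treatment"): 0.92,
    ("ME2", "treatment"): -0.85,
    ("ME3", "treatment"): 0.30,
})
```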

Correlation between all clusters and factors

Detailed package results comparison

This is an advanced section intended for comparing the output of the packages used to perform the data analysis. The data shown here do not necessarily have any biological implication.

P-value Distributions

Distributions of p-values, unadjusted and adjusted for multiple testing (FDR)
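For readers unfamiliar with the adjustment, the sketch below shows one common way to turn unadjusted p-values into FDR values. The report does not state which procedure each package applies, so the choice of the Benjamini-Hochberg method here is an assumption, as are the function name and example values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-values decrease
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical unadjusted p-values for four genes
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```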

FDR Correlations

Correlations between packages for p-values adjusted for multiple testing (FDR) and for log fold change.

Values of options used to run DEGenesHunter

First column contains the option names; second column contains the given values for each option in this run.

opt
input_file /mnt2/fscratch/users/bio_267_uma/elenarojano/NGS_projects/DMD_ELA/DMD_nextseq500_paired/results/DEGenesHunter_results/control_vs_DMD/final_counts.txt
reads 2
count_var_quantile 0
minlibraries 2
filter_type separate
output_files /mnt2/fscratch/users/bio_267_uma/elenarojano/NGS_projects/DMD_ELA/DMD_nextseq500_paired/results/DEGenesHunter_results/control_vs_DMD
p_val_cutoff 0.05
lfc 1
modules WDELN
minpack_common 4
target_file /mnt/home/users/bio_267_uma/elenarojano/projects/tfms/hurtadoLucia/analysis/DMD_nextseq500_paired/DEG_workflow/TARGETS/control_vs_DMD_target.txt
model_variables
numerics_as_factors TRUE
string_factors
numeric_factors
WGCNA_memory 5000
WGCNA_norm_method DESeq2
WGCNA_deepsplit 2
WGCNA_min_genes_cluster 30
WGCNA_detectcutHeight 0.995
WGCNA_mergecutHeight 0.25
WGCNA_all FALSE
WGCNA_blockwiseNetworkType signed
WGCNA_blockwiseTOMType signed
WGCNA_minCoreKME 0.7
WGCNA_minKMEtoStay 0.5
WGCNA_corType pearson
multifactorial
help FALSE